6 research outputs found

    Taxonomy of datasets in graph learning : a data-driven approach to improve GNN benchmarking

    Full text link
    The core research of this thesis, mostly comprising chapter four, has been accepted to the Learning on Graphs (LoG) 2022 conference for a spotlight presentation as a standalone paper, under the title "Taxonomy of Benchmarks in Graph Representation Learning", and is to be published in the Proceedings of Machine Learning Research (PMLR) series. As a main author of the paper, my specific contributions cover problem formulation, design and implementation of our taxonomy framework and experimental pipeline, collation of our results and the writing of the article.
    Deep learning on graphs has attained unprecedented levels of success in recent years thanks to Graph Neural Networks (GNNs), specialized neural network architectures that have unequivocally surpassed prior graph learning approaches. GNNs extend the success of neural networks to graph-structured data by accounting for its intrinsic geometry. While extensive research has been done on developing GNNs with superior performance according to a collection of graph representation learning benchmarks, current benchmarking procedures are insufficient to provide fair and effective evaluations of GNN models. Perhaps the most prevalent and at the same time least understood problem with respect to graph benchmarking is "domain coverage": despite the growing number of available graph datasets, most of them do not provide additional insights and on the contrary reinforce potentially harmful biases in GNN model development. This problem stems from a lack of understanding with respect to what aspects of a given model are probed by graph datasets. For example, to what extent do they test the ability of a model to leverage graph structure vs. node features? Here, we develop a principled approach to taxonomize benchmarking datasets according to a "sensitivity profile" that is based on how much GNN performance changes due to a collection of graph perturbations. Our data-driven analysis provides a deeper understanding of which benchmarking data characteristics are leveraged by GNNs. Consequently, our taxonomy can aid in the selection and development of adequate graph benchmarks, and in better-informed evaluation of future GNN methods. Finally, our approach and implementation in the GTaxoGym package (https://github.com/G-Taxonomy-Workgroup/GTaxoGym) are extensible to multiple graph prediction task types and future datasets.

    Graph Positional and Structural Encoder

    Full text link
    Positional and structural encodings (PSEs) enable better identifiability of nodes within a graph, since graphs in general lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve performance on certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

    Factors affecting breast cancer treatment delay in Turkey: a study from Turkish Federation of Breast Diseases Societies

    No full text
    Background: One of the most important factors in breast cancer (BC) mortality is treatment delay. The primary goal of this survey was to identify factors affecting the total delay time (TDT) in Turkish BC patients. Methods: A total of 1031 patients with BC were surveyed using a uniform questionnaire. The time between discovering the first symptom and signing up for the first medical visit (patient delay time; PDT) and the time between the first medical visit and the start of therapy (system delay time; SDT) were modelled separately with multilevel regression. Results: The mean PDT, SDT and TDT were 4.8, 10.5 and 13.8 weeks, respectively. In all, 42% of the patients had a TDT >12 weeks. Longer PDT was significantly correlated with disregarding symptoms and with being between 30 and 39 years of age. Shorter PDT was characteristic of patients who had stronger self-examination habits, received more support from family and friends and had at least secondary education. Predictors of longer SDT included disregard of symptoms, distrust in the success of therapy and in the medical system, and a PDT in excess of 4 weeks. Shorter SDT was linked to an age of >60 years. Patients who were diagnosed during a periodic check-up or opportunistic mammography displayed shorter SDT compared with those who had symptomatic BC and whose first medical examination was by a surgeon. Conclusion: TDT in Turkey is long and remains a major problem. Delays can be reduced by increasing BC awareness, implementing organized population-based screening programmes and founding cancer centres.

    Association of biochemical and clinical parameters with parathyroid adenoma weight. Turkish-Bulgarian endocrine and breast surgery study group, hyperparathyroidism registry study

    No full text
    Background: Primary hyperparathyroidism (pHPT) caused by a single benign parathyroid adenoma is a common endocrine disorder that is affected by regional differences. Laboratory results and pathological findings differ between geographical regions, but studies on this subject are insufficient. The article focuses on the biochemical and pathological effects of geographical differences in parathyroid adenoma. In addition, the present study seeks to elaborate on treatment methods and the effectiveness of screening in the geographical area of Bulgaria and Turkey. Method: In this prospective study, 159 patients were included from 16 centres. Demographic characteristics, symptoms, biochemical markers and pathologic characteristics were analysed and compared between 8 different regions. Results: Patients from the Turkish Black Sea region had the highest median serum calcium (Ca) level, whereas patients from Eastern Turkey had the lowest median serum phosphorus (P) level. However, Ca, parathormone (PTH) and P levels did not differ significantly between regions. Patients from Eastern Turkey had the highest adenoma weight, while patients from Bulgaria had the lowest adenoma weight. The weight of adenoma showed statistically significant differences between regions (p < 0.001). Adenoma weight correlated with serum PTH level (p = 0.05) and Ca level (p = 0.035). Conclusion: This study has provided a deeper insight into the effect of regional differences on the clinicopathological changes and biochemical values of pHPT patients with adenoma. Awareness of regional differences will assist in biochemical screening and treatment of this patient group. (c) 2021 Asian Surgical Association and Taiwan Robotic Surgery Association. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

    Regional Clinical and Biochemical Differences among Patients with Primary Hyperparathyroidism

    No full text
    Background: Environmental habitat may play a role in clinical disparities of primary hyperparathyroidism (pHPT) patients. Aims: To compare preoperative clinical symptoms, associated conditions and surgical findings in patients with pHPT living in different geographical regions of the Black Sea, Mediterranean and Anatolia. Study Design: Retrospective, clinical-based multi-centric study of 694 patients with pHPT. Methods: Patients from 23 centers and 8 different geographical regions were included. Data related to baseline demographics, clinical, pathologic and treatment characteristics of 8 regions were collected and included age, gender, residential data, symptoms, history of fracture, existence of brown tumor, serum total Ca and phosphorus (P) levels, serum parathormone (PTH) levels, serum 25-OH vitamin D levels, bone mineral density, size of the resected abnormal parathyroid gland(s), histology, as well as the presence of ectopia, presence of dual adenoma, and multiple endocrine neoplasia (MEN)- or familial-related disease. Results: The median age was 54. Asymptomatic patient rate was 25%. The median PTH level was 232 pg/mL and serum total Ca was 11.4 mg/dL. Eighty-seven percent of patients had an adenoma and 90% of these had a single adenoma. Hyperplasia was detected in 79 patients and cancer in 9 patients. The median adenoma size was 16 mm. Significant parameters differing between regions were preoperative symptoms, serum Ca and P levels, and adenoma size. All patients from South-East Anatolia were symptomatic, while the lowest P values were reported from East Anatolia and the largest adenoma size, as well as the highest Ca levels, were from Bulgaria. Conclusion: Habitat conditions vary between geographical regions. This affects the clinicopathological features of patients with pHPT.